From PAC-Bayes Bounds to KL Regularization
Authors
Abstract
We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors that we call quasi-uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that standard ℓp-regularized objective functions currently used, such as ridge regression and ℓp-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi-uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost.
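To make the connection concrete, one Catoni-style PAC-Bayes bound of the flavor the abstract refers to (a standard form, not necessarily the exact statement used in the paper) reads: for any prior P over the classifiers, any C > 0 and any δ ∈ (0, 1], with probability at least 1 − δ over an i.i.d. sample S of size n, every posterior Q satisfies

    R(G_Q) ≤ (1 / (1 − e^{−C})) · ( 1 − exp[ −( C·R̂_S(G_Q) + (KL(Q‖P) + ln(1/δ)) / n ) ] ),

where G_Q is the Gibbs classifier, R its true risk and R̂_S its empirical risk. The right-hand side is increasing in its argument, so minimizing the bound over Q amounts to minimizing the KL-regularized objective

    C·n·R̂_S(G_Q) + KL(Q‖P),

and replacing the zero-one Gibbs risk by a convex surrogate loss makes this objective convex in Q, which is the construction described above; for quasi-uniform posteriors and a uniform prior, the abstract's claim is that relaxing KL(Q‖P) recovers the usual ℓp regularizers.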
Similar papers
PAC-Bayes bounds with data dependent priors
This paper presents the prior PAC-Bayes bound and explores its capabilities as a tool to provide tight predictions of SVMs’ generalization. The computation of the bound involves estimating a prior of the distribution of classifiers from the available data, and then manipulating this prior in the usual PAC-Bayes generalization bound. We explore two alternatives: to learn the prior from a separat...
PAC Classification based on PAC Estimates of Label Class Distributions
A standard approach in pattern classification is to estimate the distributions of the label classes, and then to apply the Bayes classifier to the estimates of the distributions in order to classify unlabeled examples. As one might expect, the better our estimates of the label class distributions, the better the resulting classifier will be. In this paper we make this observation precise by ide...
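As a minimal sketch of the plug-in approach described above (illustrative only; the Gaussian class-conditional model and all names are assumptions, not taken from the paper), one can estimate each label class's distribution and then apply the Bayes rule to the estimates:

import numpy as np

class PlugInBayesClassifier:
    # Estimate a diagonal Gaussian per label class, then classify with the
    # Bayes rule applied to the estimated densities and class priors.

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.log_priors_, self.means_, self.vars_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.log_priors_.append(np.log(len(Xc) / len(X)))  # estimate of P(Y = c)
            self.means_.append(Xc.mean(axis=0))                # estimated mean of X given Y = c
            self.vars_.append(Xc.var(axis=0) + 1e-9)           # estimated variance, kept positive
        return self

    def predict(self, X):
        scores = []
        for log_prior, mean, var in zip(self.log_priors_, self.means_, self.vars_):
            # log of the estimated Gaussian density plus the log class prior
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
            scores.append(log_lik + log_prior)
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]

The better the density estimates, the closer this plug-in rule gets to the true Bayes classifier, which is the observation the excerpt says the paper makes precise.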
Tighter PAC-Bayes bounds through distribution-dependent priors
We further develop the idea that the PAC-Bayes prior can be informed by the data-generating distribution. We prove sharp bounds for an existing framework of stochastic exponential weights algorithms, and develop insights into controlling function class complexity in this model. In particular we consider controlling capacity with respect to the unknown geometry defined by the data-generating dist...
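For reference, the stochastic exponential weights posterior that such algorithms draw from is usually written (standard form; the paper's exact variant may differ) as

    Q_η(f) ∝ P(f) · exp( −η·n·R̂_S(f) ),

for a prior P, a learning rate η > 0 and the empirical risk R̂_S, so the prior, and hence the geometry it encodes, directly shapes which classifiers receive weight.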
Hands-On Learning Theory Fall 2016, Lecture 4
Recall that in Theorem 2.1, we analyzed empirical risk minimization with a finite hypothesis class F, i.e., |F| < +∞. Here, we will prove results for possibly infinite hypothesis classes. Although the PAC-Bayes framework is far more general, we will concentrate on the prediction problem as before, i.e., (∀f ∈ F) f : X → Y. Also, note that Theorem 2.1 could have been stated in a more general fa...
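As a rough sketch of the step being taken here (the constants in the lecture's Theorem 2.1 may differ), the finite-class statement obtained from Hoeffding's inequality and a union bound,

    R(f) ≤ R̂_S(f) + sqrt( ln(|F|/δ) / (2n) )   for all f ∈ F, with probability at least 1 − δ,

is replaced in the PAC-Bayes framework by a bound that holds simultaneously for all posteriors Q over a possibly infinite F, e.g.

    E_{f∼Q}[R(f)] ≤ E_{f∼Q}[R̂_S(f)] + sqrt( (KL(Q‖P) + ln(2√n/δ)) / (2n) ),

so the ln|F| complexity term is traded for the KL divergence between the posterior Q and a fixed prior P.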
Hands-On Learning Theory Fall 2017, Lecture 4
Recall that in Theorem 2.1, we analyzed empirical risk minimization with a finite hypothesis class F, i.e., |F| < +∞. Here, we will prove results for possibly infinite hypothesis classes. Although the PAC-Bayes framework is far more general, we will concentrate on the prediction problem as before, i.e., (∀f ∈ F) f : X → Y. Also, note that Theorem 2.1 could have been stated in a more general fa...